{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# **RAG Fusion**\n",
        "RAG-Fusion is an enhanced version of the traditional Retrieval-Augmented Generation (RAG) model. In RAG-Fusion, after receiving a query, the model first generates related sub-queries using a large language model. These sub-queries help find more relevant documents. Instead of simply sending the retrieved documents to the model, RAG-Fusion uses a technique called Reciprocal Rank Fusion (RRF) to score and reorder the documents based on their relevance. The best-ranked documents are then used to generate a more accurate response.\n",
        "\n",
        "Research Paper: [RAG Fusion](https://arxiv.org/pdf/2402.03367)"
      ],
      "metadata": {
        "id": "URDLuILmv7Du"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Initial Setup**"
      ],
      "metadata": {
        "id": "P5PkddQ5wZVS"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "QsSAA2QVvfpz"
      },
      "outputs": [],
      "source": [
        "! pip install --q athina langsmith"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import os\n",
        "from google.colab import userdata\n",
        "os.environ[\"OPENAI_API_KEY\"] = userdata.get('OPENAI_API_KEY')\n",
        "os.environ['ATHINA_API_KEY'] = userdata.get('ATHINA_API_KEY')\n",
        "os.environ['QDRANT_API_KEY'] = userdata.get('QDRANT_API_KEY')"
      ],
      "metadata": {
        "id": "Ij2kz-gAwdfC"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Indexing**"
      ],
      "metadata": {
        "id": "at5SmuJXwg-G"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# load embedding model\n",
        "from langchain_openai import OpenAIEmbeddings\n",
        "embeddings = OpenAIEmbeddings()"
      ],
      "metadata": {
        "id": "g6Xk1ZJWwhap"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# load data\n",
        "from langchain.document_loaders import CSVLoader\n",
        "loader = CSVLoader(\"./context.csv\")\n",
        "documents = loader.load()"
      ],
      "metadata": {
        "id": "t8pp2bKRwkZC"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# split documents\n",
        "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
        "text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
        "documents = text_splitter.split_documents(documents)"
      ],
      "metadata": {
        "id": "X_XTChkywmX0"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Qdrant Vector Database**"
      ],
      "metadata": {
        "id": "uUXoyVQUkEiP"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# create vectorstore\n",
        "from langchain_community.vectorstores import Qdrant\n",
        "vectorstore = Qdrant.from_documents(\n",
        "    documents,\n",
        "    embeddings,\n",
        "    url=\"your_qdrant_url\",\n",
        "    prefer_grpc=True,\n",
        "    collection_name=\"documents\",\n",
        "    api_key=os.environ[\"QDRANT_API_KEY\"],\n",
        ")"
      ],
      "metadata": {
        "id": "foUKsehPkPHf"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Chromadb (Optional)**"
      ],
      "metadata": {
        "id": "5sieex4HkN1a"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# # optional vectorstore\n",
        "# !pip install chromadb\n",
        "# # create vectorstore\n",
        "# from langchain.vectorstores import Chroma\n",
        "# vectorstore = Chroma.from_documents(documents, embeddings)"
      ],
      "metadata": {
        "id": "4GM49mhUwoDY"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Retriever**"
      ],
      "metadata": {
        "id": "oKPlGdzwwpuz"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# create retriever\n",
        "retriever = vectorstore.as_retriever()"
      ],
      "metadata": {
        "id": "RwZ38y8ZwqNL"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Reciprocal Rank Fusion Chain**"
      ],
      "metadata": {
        "id": "MqS-Hx24wx5d"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# create llm\n",
        "from langchain_openai import ChatOpenAI\n",
        "llm = ChatOpenAI()"
      ],
      "metadata": {
        "id": "mY5nfky7wws-"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# create chain\n",
        "from langchain_core.output_parsers import StrOutputParser\n",
        "from langsmith import Client\n",
        "client = Client()\n",
        "prompt = client.pull_prompt(\"langchain-ai/rag-fusion-query-generation\")"
      ],
      "metadata": {
        "id": "aU-M9VuxxKZN"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# generate queries\n",
        "generate_queries = (\n",
        "    prompt | ChatOpenAI(temperature=0) | StrOutputParser() | (lambda x: x.split(\"\\n\"))\n",
        ")"
      ],
      "metadata": {
        "id": "_hawpPm2xvxi"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# rerank results\n",
        "from langchain.load import dumps, loads\n",
        "\n",
        "def reciprocal_rank_fusion(results: list[list], k=60):\n",
        "    fused_scores = {}\n",
        "    for docs in results:\n",
        "        # Assumes the docs are returned in sorted order of relevance\n",
        "        for rank, doc in enumerate(docs):\n",
        "            doc_str = dumps(doc)\n",
        "            if doc_str not in fused_scores:\n",
        "                fused_scores[doc_str] = 0\n",
        "            previous_score = fused_scores[doc_str]\n",
        "            fused_scores[doc_str] += 1 / (rank + k)\n",
        "\n",
        "    reranked_results = [\n",
        "        (loads(doc), score)\n",
        "        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)\n",
        "    ]\n",
        "    return reranked_results"
      ],
      "metadata": {
        "id": "rf6kiKG0x0lS"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# create chain\n",
        "chain = generate_queries | retriever.map() | reciprocal_rank_fusion"
      ],
      "metadata": {
        "id": "at5rGstCx444"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# check input schema\n",
        "chain.input_schema.schema()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "quMrwq9px8dD",
        "outputId": "024253f5-5d4d-41a8-c218-38d5a8e5f607"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "/usr/local/lib/python3.10/dist-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.\n",
            "  and should_run_async(code)\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "{'title': 'PromptInput',\n",
              " 'type': 'object',\n",
              " 'properties': {'original_query': {'title': 'Original Query',\n",
              "   'type': 'string'}}}"
            ]
          },
          "metadata": {},
          "execution_count": 23
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# rerank results\n",
        "chain.invoke(\"what are points on a mortgage\")"
      ],
      "metadata": {
        "id": "Eql6Fin9yCmr"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **RAG Chain**"
      ],
      "metadata": {
        "id": "5eZKfWqvyUwA"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from langchain.schema.runnable import RunnablePassthrough\n",
        "from langchain_core.prompts import ChatPromptTemplate\n",
        "\n",
        "template = \"\"\"Answer the question based only on the following context.\n",
        "If you don't find the answer in the context, just say that you don't know.\n",
        "\n",
        "Context: {context}\n",
        "\n",
        "Question: {question}\n",
        "\"\"\"\n",
        "prompt = ChatPromptTemplate.from_template(template)\n",
        "\n",
        "rag_fusion_chain = (\n",
        "    {\n",
        "        \"context\": chain,\n",
        "        \"question\": RunnablePassthrough()\n",
        "    }\n",
        "    | prompt\n",
        "    | llm\n",
        "    | StrOutputParser()\n",
        ")"
      ],
      "metadata": {
        "id": "n1AJNW1fyUC-"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "rag_fusion_chain.invoke(\"what are points on a mortgage\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 107
        },
        "id": "4VHhYcOVybSI",
        "outputId": "c0b8a967-f595-4804-992c-3b5e644141b1"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "/usr/local/lib/python3.10/dist-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.\n",
            "  and should_run_async(code)\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "'Points on a mortgage are a form of pre-paid interest that borrowers can offer to pay a lender in order to reduce the interest rate on the loan, resulting in a lower monthly payment.'"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            }
          },
          "metadata": {},
          "execution_count": 26
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Preparing Data for Evaluation**"
      ],
      "metadata": {
        "id": "mPn0wjWizIMi"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "question = [\"what are points on a mortgage\"]\n",
        "response = []\n",
        "contexts = []\n",
        "ground_truths = [\"Points, sometimes also called a 'discount point', are a form of pre-paid interest.\"]\n",
        "\n",
        "# Inference\n",
        "for query in question:\n",
        "  response.append(rag_fusion_chain.invoke(query))\n",
        "  contexts.append([docs.page_content for docs in retriever.get_relevant_documents(query)])\n",
        "\n",
        "# To dict\n",
        "data = {\n",
        "    \"query\": question,\n",
        "    \"response\": response,\n",
        "    \"context\": contexts,\n",
        "    \"ground_truth\": ground_truths\n",
        "}"
      ],
      "metadata": {
        "id": "ILMCoF4fy3Fu"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# create dataset\n",
        "from datasets import Dataset\n",
        "dataset = Dataset.from_dict(data)"
      ],
      "metadata": {
        "id": "AMapriM_zI7v"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# create dataframe\n",
        "import pandas as pd\n",
        "df = pd.DataFrame(dataset)"
      ],
      "metadata": {
        "id": "9SVP6pbfzNJZ"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "df"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 144
        },
        "id": "ZFgRZXk3zTFy",
        "outputId": "59cd67ad-531e-47dd-8d84-5f7695336910"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "/usr/local/lib/python3.10/dist-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.\n",
            "  and should_run_async(code)\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "                           query  \\\n",
              "0  what are points on a mortgage   \n",
              "\n",
              "                                            response  \\\n",
              "0  Points on a mortgage are a form of pre-paid in...   \n",
              "\n",
              "                                             context  \\\n",
              "0  [context: [\"Discount points, also called mortg...   \n",
              "\n",
              "                                        ground_truth  \n",
              "0  Points, sometimes also called a 'discount poin...  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-ab5037cf-fd88-4d3c-a7c7-75a16c0c284d\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>query</th>\n",
              "      <th>response</th>\n",
              "      <th>context</th>\n",
              "      <th>ground_truth</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>what are points on a mortgage</td>\n",
              "      <td>Points on a mortgage are a form of pre-paid in...</td>\n",
              "      <td>[context: [\"Discount points, also called mortg...</td>\n",
              "      <td>Points, sometimes also called a 'discount poin...</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-ab5037cf-fd88-4d3c-a7c7-75a16c0c284d')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-ab5037cf-fd88-4d3c-a7c7-75a16c0c284d button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-ab5037cf-fd88-4d3c-a7c7-75a16c0c284d');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "  <div id=\"id_c1189c00-7229-4364-ab59-fcadceae8756\">\n",
              "    <style>\n",
              "      .colab-df-generate {\n",
              "        background-color: #E8F0FE;\n",
              "        border: none;\n",
              "        border-radius: 50%;\n",
              "        cursor: pointer;\n",
              "        display: none;\n",
              "        fill: #1967D2;\n",
              "        height: 32px;\n",
              "        padding: 0 0 0 0;\n",
              "        width: 32px;\n",
              "      }\n",
              "\n",
              "      .colab-df-generate:hover {\n",
              "        background-color: #E2EBFA;\n",
              "        box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "        fill: #174EA6;\n",
              "      }\n",
              "\n",
              "      [theme=dark] .colab-df-generate {\n",
              "        background-color: #3B4455;\n",
              "        fill: #D2E3FC;\n",
              "      }\n",
              "\n",
              "      [theme=dark] .colab-df-generate:hover {\n",
              "        background-color: #434B5C;\n",
              "        box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "        filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "        fill: #FFFFFF;\n",
              "      }\n",
              "    </style>\n",
              "    <button class=\"colab-df-generate\" onclick=\"generateWithVariable('df')\"\n",
              "            title=\"Generate code using this dataframe.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "    <script>\n",
              "      (() => {\n",
              "      const buttonEl =\n",
              "        document.querySelector('#id_c1189c00-7229-4364-ab59-fcadceae8756 button.colab-df-generate');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      buttonEl.onclick = () => {\n",
              "        google.colab.notebook.generateWithVariable('df');\n",
              "      }\n",
              "      })();\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "variable_name": "df",
              "summary": "{\n  \"name\": \"df\",\n  \"rows\": 1,\n  \"fields\": [\n    {\n      \"column\": \"query\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 1,\n        \"samples\": [\n          \"what are points on a mortgage\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"response\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 1,\n        \"samples\": [\n          \"Points on a mortgage are a form of pre-paid interest available in the United States when arranging a mortgage. One point equals one percent of the loan amount. Borrowers can offer to pay points to reduce the interest rate on the loan, thus obtaining a lower monthly payment in exchange for this up-front payment.\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"context\",\n      \"properties\": {\n        \"dtype\": \"object\",\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"ground_truth\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 1,\n        \"samples\": [\n          \"Points, sometimes also called a 'discount point', are a form of pre-paid interest.\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
            }
          },
          "metadata": {},
          "execution_count": 30
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# Convert to dictionary\n",
        "df_dict = df.to_dict(orient='records')\n",
        "\n",
        "# Convert context to list\n",
        "for record in df_dict:\n",
        "    if not isinstance(record.get('context'), list):\n",
        "        if record.get('context') is None:\n",
        "            record['context'] = []\n",
        "        else:\n",
        "            record['context'] = [record['context']]"
      ],
      "metadata": {
        "id": "_w4dyNk6zWRI"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Evaluation in Athina AI**\n",
        "\n",
        "We will use **Answer Relevancy** eval here. It Measures how pertinent the generated response is to the given prompt. Please refer to our [documentation](https://docs.athina.ai/api-reference/evals/preset-evals/overview) for further details."
      ],
      "metadata": {
        "id": "3t-dhvCTzYZi"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# set api keys for Athina evals\n",
        "from athina.keys import AthinaApiKey, OpenAiApiKey\n",
        "OpenAiApiKey.set_key(os.getenv('OPENAI_API_KEY'))\n",
        "AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))"
      ],
      "metadata": {
        "id": "m1D6qv0-zYII"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# load dataset\n",
        "from athina.loaders import Loader\n",
        "dataset = Loader().load_dict(df_dict)"
      ],
      "metadata": {
        "id": "jNOq_2suzduz"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# evaluate\n",
        "from athina.evals import RagasAnswerRelevancy\n",
        "RagasAnswerRelevancy(model=\"gpt-4o\").run_batch(data=dataset).to_df()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 569
        },
        "id": "mXOmVsq8zhjD",
        "outputId": "00be51d7-a0c9-467e-a130-06b5a3483cb1"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "/usr/local/lib/python3.10/dist-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.\n",
            "  and should_run_async(code)\n"
          ]
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "evaluating with [answer_relevancy]\n"
          ]
        },
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "\r  0%|          | 0/1 [00:00<?, ?it/s]/usr/local/lib/python3.10/dist-packages/pydantic/main.py:1024: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.6/migration/\n",
            "  warnings.warn('The `dict` method is deprecated; use `model_dump` instead.', category=PydanticDeprecatedSince20)\n",
            "/usr/local/lib/python3.10/dist-packages/pydantic/main.py:1024: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.6/migration/\n",
            "  warnings.warn('The `dict` method is deprecated; use `model_dump` instead.', category=PydanticDeprecatedSince20)\n",
            "100%|██████████| 1/1 [00:01<00:00,  1.11s/it]\n"
          ]
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "You can view your dataset at: https://app.athina.ai/develop/f9377fc3-cd03-4feb-9b52-c8e747590c5b\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "                           query  \\\n",
              "0  what are points on a mortgage   \n",
              "\n",
              "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               context  \\\n",
              "0  [context: [\"Discount points, also called mortgage points or simply points, are a form of pre-paid interest available in the United States when arranging a mortgage. One point equals one percent of the loan amount. By charging a borrower points, a lender effectively increases the yield on the loan above the amount of the stated interest rate.  Borrowers can offer to pay a lender points as a method to reduce the interest rate on the loan, thus obtaining a lower monthly payment in exchange for ...   \n",
              "\n",
              "                                                                                                                                                                                                                                                                                                                   response  \\\n",
              "0  Points on a mortgage are a form of pre-paid interest available in the United States when arranging a mortgage. One point equals one percent of the loan amount. Borrowers can offer to pay points to reduce the interest rate on the loan, thus obtaining a lower monthly payment in exchange for this up-front payment.   \n",
              "\n",
              "  expected_response            display_name failed  \\\n",
              "0              None  Ragas Answer Relevancy   None   \n",
              "\n",
              "                                                                                                                                                                                                                                                                  grade_reason  \\\n",
              "0  A response is deemed relevant when it directly and appropriately addresses the original query. Importantly, our assessment of answer relevance does not consider factuality but instead penalizes cases where the response lacks completeness or contains redundant details   \n",
              "\n",
              "   runtime   model  ragas_answer_relevancy  \n",
              "0     1454  gpt-4o                0.919036  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-d40e90aa-7b30-48f5-a0d7-c3ed4b273ccf\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>query</th>\n",
              "      <th>context</th>\n",
              "      <th>response</th>\n",
              "      <th>expected_response</th>\n",
              "      <th>display_name</th>\n",
              "      <th>failed</th>\n",
              "      <th>grade_reason</th>\n",
              "      <th>runtime</th>\n",
              "      <th>model</th>\n",
              "      <th>ragas_answer_relevancy</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>what are points on a mortgage</td>\n",
              "      <td>[context: [\"Discount points, also called mortgage points or simply points, are a form of pre-paid interest available in the United States when arranging a mortgage. One point equals one percent of the loan amount. By charging a borrower points, a lender effectively increases the yield on the loan above the amount of the stated interest rate.  Borrowers can offer to pay a lender points as a method to reduce the interest rate on the loan, thus obtaining a lower monthly payment in exchange for ...</td>\n",
              "      <td>Points on a mortgage are a form of pre-paid interest available in the United States when arranging a mortgage. One point equals one percent of the loan amount. Borrowers can offer to pay points to reduce the interest rate on the loan, thus obtaining a lower monthly payment in exchange for this up-front payment.</td>\n",
              "      <td>None</td>\n",
              "      <td>Ragas Answer Relevancy</td>\n",
              "      <td>None</td>\n",
              "      <td>A response is deemed relevant when it directly and appropriately addresses the original query. Importantly, our assessment of answer relevance does not consider factuality but instead penalizes cases where the response lacks completeness or contains redundant details</td>\n",
              "      <td>1454</td>\n",
              "      <td>gpt-4o</td>\n",
              "      <td>0.919036</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-d40e90aa-7b30-48f5-a0d7-c3ed4b273ccf')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-d40e90aa-7b30-48f5-a0d7-c3ed4b273ccf button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-d40e90aa-7b30-48f5-a0d7-c3ed4b273ccf');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "repr_error": "Out of range float values are not JSON compliant: nan"
            }
          },
          "metadata": {},
          "execution_count": 34
        }
      ]
    }
  ]
}